Art of Assembly Language: Chapter Fifteen
- Chapter 15 - Strings and Character Sets
- 15.0 - Chapter Overview
- 15.1 - The 80x86 String Instructions
- 15.1.1 - How the String Instructions Operate
- 15.1.2 - The REP/REPE/REPZ and REPNZ/REPNE Prefixes
- 15.1.3 - The Direction Flag
- 15.1.4 - The MOVS Instruction
- 15.1.5 - The CMPS Instruction
- 15.1.6 - The SCAS Instruction
- 15.1.7 - The STOS Instruction
- 15.1.8 - The LODS Instruction
- 15.1.9 - Building Complex String Functions from LODS and STOS
- 15.1.10 - Prefixes and the String Instructions
- 15.2 - Character Strings
- 15.2.1 - Types of Strings
- 15.2.2 - String Assignment
- 15.2.3 - String Comparison
- 15.3 - Character String Functions
- 15.3.1 - Substr
- 15.3.2 - Index
- 15.3.3 - Repeat
- 15.3.4 - Insert
- 15.3.5 - Delete
- 15.3.6 - Concatenation
- 15.4 - String Functions in the UCR Standard Library
- 15.4.1 - StrBDel, StrBDelm
- 15.4.2 - Strcat, Strcatl, Strcatm, Strcatml
- 15.4.3 - Strchr
- 15.4.4 - Strcmp, Strcmpl, Stricmp, Stricmpl
- 15.4.5 - Strcpy, Strcpyl, Strdup, Strdupl
- 15.4.6 - Strdel, Strdelm
- 15.4.7 - Strins, Strinsl, Strinsm, Strinsml
- 15.4.8 - Strlen
- 15.4.9 - Strlwr, Strlwrm, Strupr, Struprm
- 15.4.10 - Strrev, Strrevm
- 15.4.11 - Strset, Strsetm
- 15.4.12 - Strspan, Strspanl, Strcspan, Strcspanl
- 15.4.13 - Strstr, Strstrl
- 15.4.14 - Strtrim, Strtrimm
- 15.4.15 - Other String Routines in the UCR Standard Library
- 15.5 - The Character Set Routines in the UCR Standard Library
- 15.6 - Using the String Instructions on Other Data Types
- 15.6.1 - Multi-precision Integer Strings
- 15.6.2 - Dealing with Whole Arrays and Records
- 15.7 - Sample Programs
- 15.7.1 - Find.asm
- 15.7.2 - StrDemo.asm
- 15.7.3 - Fcmp.asm
Copyright 1996 by Randall Hyde
All rights reserved.
Duplication other than for immediate display through a browser is prohibited
by U.S. Copyright Law.
This material is provided on-line as a beta-test of this text. It is for
the personal use of the reader only. If you are interested in using this
material as part of a course, please contact
rhyde@cs.ucr.edu
Supporting software and other materials are available via anonymous ftp
from ftp.cs.ucr.edu. See the "/pub/pc/ibmpcdir" directory for
details. You may also download the material from "Randall Hyde's Assembly
Language Page" at URL:
http://webster.ucr.edu
Notes:
This document does not contain the laboratory exercises, programming assignments,
exercises, or chapter summary. These portions were omitted for several reasons:
either they wouldn't format properly, they contained hyperlinks that were
too much work to resolve, they were under constant revision, or they were
not included for security reasons. Such omission should have very little
impact on the reader interested in learning this material or evaluating
this document.
This document was prepared using Harlequin's Web Maker 2.2 and Quadralay's
Webworks Publisher. Since HTML does not support the rich formatting options
available in Framemaker, this document is only an approximation of the actual
chapter from the textbook.
If you are absolutely dying to get your hands on a version other than HTML,
you might consider having the UCR Printing and Reprographics Department run
you off a copy on their Xerox machines. For details, please read the following
EMAIL message I received from the Printing and Reprographics Department:
Hello Again Professor Hyde,
Dallas gave me permission to take orders for the Computer Science 13 Manuals.
We would need to take charge card orders. The only cards we take are: Master
Card, Visa, and Discover. They would need to send the name, numbers, expiration
date, type of card, and authorization to charge $95.00 for the manual and
shipping, also we should have their phone number in case the company has
any trouble delivery. They can use my e-mail address for the orders and
I will process them as soon as possible. I would assume that two weeks would
be sufficient for printing, packages and delivery time.
I am open to suggestions if you can think of any to make this as easy as
possible.
Thank You for your business,
Kathy Chapman, Assistant
Printing and Reprographics
University of California
Riverside
(909) 787-4443/4444
We are currently working on ways to publish this text in a form other than
HTML (e.g., Postscript, PDF, Frameviewer, hard copy, etc.). This, however,
is a low-priority project. Please do not contact Randall Hyde concerning
this effort. When something happens, an announcement will appear on "Randall
Hyde's Assembly Language Page." Please visit this WEB site at http://webster.ucr.edu
for the latest scoop.
Chapter 15 Strings and Character Sets
A string is a collection of objects stored in contiguous memory locations.
Strings are usually arrays of bytes, words, or (on 80386 and later processors)
double words. The 80x86 microprocessor family supports several instructions
specifically designed to cope with strings. This chapter explores some of
the uses of these string instructions.
The 8088, 8086, 80186, and 80286 can process two types of strings: byte
strings and word strings. The 80386 and later processors also handle double
word strings. They can move strings, compare strings, search for a specific
value within a string, initialize a string to a fixed value, and do other
primitive operations on strings. The 80x86's string instructions are also
useful for manipulating arrays, tables, and records. You can easily assign
or compare such data structures using the string instructions. Using string
instructions may speed up your array manipulation code considerably.
15.0 Chapter Overview
This chapter presents a review of the operation of the 80x86 string
instructions. Then it discusses how to process character strings using these
instructions. Finally, it concludes by discussing the string functions
available in the UCR Standard Library. The sections below that have a "*"
prefix are essential. Those sections with a "o" prefix discuss advanced
topics that you may want to put off for a while.
* The 80x86 string instructions.
* Character strings.
* Character string functions.
* String functions in the UCR Standard Library.
o Using the string instructions on other data types.
15.1 The 80x86 String Instructions
All members of the 80x86 family support five different string instructions:
movs, cmps, scas, lods, and stos[1].
They are the string primitives since you can build most other string operations
from these five instructions. How you use these five instructions is the
topic of the next several sections.
15.1.1 How the String Instructions Operate
The string instructions operate on blocks (contiguous linear arrays)
of memory. For example, the movs instruction moves a sequence
of bytes from one memory location to another. The cmps instruction
compares two blocks of memory. The scas instruction scans a
block of memory for a particular value. These string instructions often
require three operands, a destination block address, a source block address,
and (optionally) an element count. For example, when using the movs
instruction to copy a string, you need a source address, a destination
address, and a count (the number of string elements to move).
Unlike other instructions which operate on memory, the string instructions
are single-byte instructions which don't have any explicit operands. The
operands for the string instructions include
- the si (source index) register,
- the di (destination index) register,
- the cx (count) register,
- the ax register, and
- the direction flag in the FLAGS register.
For example, one variant of the movs (move string) instruction
copies a string from the source address specified by ds:si to
the destination address specified by es:di, of length cx.
Likewise, the cmps instruction compares the string pointed
at by ds:si, of length cx, to the string pointed at by es:di.
Not all instructions have source and destination operands (only movs
and cmps support them). For example, the scas
instruction (scan a string) compares the value in the accumulator
to values in memory. Despite their differences, the 80x86's string instructions
all have one thing in common - using them requires that you deal with two
segments, the data segment and the extra segment.
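As a rough illustration of how these implicit operands fit together, the following sketch sets up a block copy. The names Source, Dest, and Count are hypothetical, and the code assumes that ds already addresses the segment holding both buffers. It uses the rep prefix and the cld instruction, both described in the sections that follow.

```asm
; Hypothetical data, declared in the data segment:

Count           equ     16
Source          byte    Count dup ('A')
Dest            byte    Count dup (?)

; In the code segment, assuming ds already addresses that data segment:

                push    ds              ;Make es address the same
                pop     es              ; segment as ds.
                lea     si, Source      ;ds:si -> source block.
                lea     di, Dest        ;es:di -> destination block.
                mov     cx, Count       ;Number of bytes to move.
                cld                     ;Process toward higher addresses.
        rep     movsb                   ;Copy cx bytes from ds:si to es:di.
```

Loading es from ds is only one way to satisfy the two-segment requirement; if the destination lives in a different segment, es would have to be loaded with that segment's address instead.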
15.1.2 The REP/REPE/REPZ and REPNZ/REPNE Prefixes
The string instructions, by themselves, do not operate on strings of
data. The movs instruction, for example, will move a single
byte, word, or double word. When executed by itself, the movs instruction
ignores the value in the cx register. The repeat prefixes tell
the 80x86 to do a multi-byte string operation. The syntax for the repeat
prefix is:
Field:
Label   repeat  mnemonic  operand       ;comment
For MOVS:
        rep     movs    {operands}
For CMPS:
        repe    cmps    {operands}
        repz    cmps    {operands}
        repne   cmps    {operands}
        repnz   cmps    {operands}
For SCAS:
        repe    scas    {operands}
        repz    scas    {operands}
        repne   scas    {operands}
        repnz   scas    {operands}
For STOS:
        rep     stos    {operands}
You don't normally use the repeat prefixes with the lods instruction.
As you can see, the presence of the repeat prefixes introduces a new field
in the source line - the repeat prefix field. This field appears only on
source lines containing string instructions. In your source file:
- the label field should always begin in column one,
- the repeat field should begin at the first tab stop, and
- the mnemonic field should begin at the second tab stop.
When specifying the repeat prefix before a string instruction, the string
instruction repeats cx times[2].
Without the repeat prefix, the instruction operates only on a single byte,
word, or double word.
You can use repeat prefixes to process entire strings with a single instruction.
You can use the string instructions, without the repeat prefix, as string
primitive operations to synthesize more powerful string operations.
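The difference between the prefixed and unprefixed forms can be sketched with stos, which stores the accumulator at es:di. Buffer is a hypothetical name, the code assumes ds addresses the data segment, and the cld instruction it uses is covered in the section on the direction flag.

```asm
; Hypothetical data, in the data segment:

Buffer          word    33 dup (?)

; Code, assuming ds addresses the data segment:

                push    ds
                pop     es              ;es:di must address the buffer.
                lea     di, Buffer
                mov     ax, 0           ;Value to store.
                mov     cx, 32          ;Element count for the prefix.
                cld                     ;Move toward higher addresses.
        rep     stosw                   ;Stores ax 32 times; cx is 0 afterward.
                stosw                   ;No prefix: stores one word, ignores cx.
```

The final stosw writes the 33rd word even though cx is zero at that point, since an unprefixed string instruction never consults cx.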
The operand field is optional. If present, MASM simply uses it to determine
the size of the string to operate on. If the operand field is the name of
a byte variable, the string instruction operates on bytes. If the operand
is a word address, the instruction operates on words. Likewise for double
words. If the operand field is not present, you must append a "B",
"W", or "D" to the end of the string instruction to
denote the size, e.g., movsb, movsw, or movsd.
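One consequence of the size suffix is that cx counts elements, not bytes. The sketch below shows two ways to copy the same 16 bytes. It assumes that ds:si and es:di already point at hypothetical source and destination blocks and that the direction flag is clear.

```asm
; Byte-sized elements: sixteen moves of one byte each.

                mov     cx, 16
        rep     movsb

; Word-sized elements: after reloading si and di with the starting
; addresses, eight moves of two bytes each cover the same 16 bytes.

                mov     cx, 8
        rep     movsw
```

Only the element count changes between the two forms. The word version is not a drop-in replacement when the block length is odd, since one byte would be left over.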
15.1.3 The Direction Flag
Besides the si, di, cx, and ax registers,
one other register controls the 80x86's string instructions - the flags
register. Specifically, the direction flag in the flags register controls
how the CPU processes strings.
If the direction flag is clear, the CPU increments si and di
after operating upon each string element. For example, if the direction
flag is clear, then executing movs will move the byte, word,
or double word at ds:si to es:di and will increment
si and di by one, two, or four. When specifying
the rep prefix before this instruction, the CPU increments
si and di for each element in the string. At completion,
the si and di registers will be pointing at the
first item beyond the string.
If the direction flag is set, then the 80x86 decrements si and
di after processing each string element. After a repeated string
operation, the si and di registers will be pointing
at the first byte, word, or double word before the strings if the direction
flag was set.
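A practical consequence is that, with the direction flag set, si and di must start at the last element of each block rather than the first. The following sketch copies a block backwards; Source, Dest, and Count are hypothetical names, and the code assumes ds and es both address the segment containing the buffers. The std instruction sets the direction flag and cld clears it.

```asm
; Hypothetical data, in the data segment:

Count           equ     16
Source          byte    Count dup ('A')
Dest            byte    Count dup (?)

; Code, assuming ds and es both address the data segment:

                lea     si, Source+Count-1  ;ds:si -> last source byte.
                lea     di, Dest+Count-1    ;es:di -> last destination byte.
                mov     cx, Count
                std                         ;D=1: si and di decrement.
        rep     movsb
                cld                         ;Return to D=0 afterward.

; At this point si = offset Source - 1 and di = offset Dest - 1.
```

Clearing the flag again at the end is a convention this sketch adopts, not a requirement of the instruction.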
The direction flag may be set or cleared using the cld (clear
direction flag) and std (set direction flag) instructions.
When using these instructions inside a procedure, keep in mind that they
modify the machine state. Therefore, you may need to save the direction
flag during the execution of that procedure. The following example exhibits
the kinds of problems you might encounter:
StringStuff:
                cld
        <do some operations>
                call    Str2
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                std
        <Do some string operations>
                ret
Str2            endp
This code will not work properly. The calling code assumes that the direction
flag is clear after Str2 returns. However, this isn't true.
Therefore, the string operations executed after the call to Str2 will
not function properly.
There are a couple of ways to handle this problem. The first, and probably
the most obvious, is always to insert the cld or std
instructions immediately before executing a string instruction. The
other alternative is to save and restore the direction flag using the pushf
and popf instructions. Using these two techniques, the
code above would look like this:
Always issuing cld or std before a string instruction:
StringStuff:
                cld
        <do some operations>
                call    Str2
                cld
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                std
        <Do some string operations>
                ret
Str2            endp
Saving and restoring the flags register:
StringStuff:
                cld
        <do some operations>
                call    Str2
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                pushf
                std
        <Do some string operations>
                popf
                ret
Str2            endp
If you use the pushf and popf instructions to
save and restore the flags register, keep in mind that you're saving and
restoring all the flags. Therefore, such subroutines cannot return any information
in the flags. For example, you will not be able to return an error condition
in the carry flag if you use pushf and popf.
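If a routine must both preserve the caller's direction flag and report a result in the carry flag, one possible arrangement is to set or clear carry after the popf, since stc and clc change only the carry flag. The routine below is a hypothetical sketch, not something from the text or the UCR Standard Library. It assumes cx is greater than zero on entry and that es:di points at the last byte of the block to search.

```asm
; FindLast (hypothetical): scans cx bytes backwards from es:di for the
; byte in al. Returns carry clear if found, carry set if not found.
; The caller's direction flag is preserved in either case.

FindLast        proc    near
                pushf                   ;Save caller's flags, D included.
                std                     ;Scan toward lower addresses.
        repne   scasb                   ;Stop on a match or when cx = 0.
                jne     NotFound        ;Z=0 here means no byte matched.
                popf                    ;Restore the caller's flags.
                clc                     ;Then report success in carry.
                ret

NotFound:       popf                    ;Restore the caller's flags.
                stc                     ;Then report failure in carry.
                ret
FindLast        endp
```

Because di is decremented after each comparison, a successful scan leaves di pointing one byte below the matching byte. All flags other than carry come back exactly as the caller left them.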
[1] The 80186 and later processors support two
additional string instructions, INS and OUTS, which input strings of data
from an input port or output strings of data to an output port. We will
not consider these instructions in this chapter.
[2] Except for the cmps and scas instructions used with the repe/repz or
repne/repnz prefixes, which repeat at most the number
of times specified in the cx register.
Art of Assembly: Chapter Fifteen - 28 SEP 1996